Parsing di Corpora di Apprendenti di Italiano: un Primo Studio su VALICO (Parsing Italian Learner Corpora: a Case Study on VALICO)
Abstract
English. Modern learner corpora are now routinely PoS tagged, whereas syntactic parsing is much less common. This paper presents a first attempt at parsing a subcorpus of VALICO, in an effort to identify key elements that can later be used to parse corpora of Italian as a foreign language in...
Similar resources
Studio sull'Ordine dei Costituenti nel Confronto tra Generi e Complessità (Analysis of Constituents Order Across Textual Genres and Complexity)
English. In this paper we present a study of constituent order in Italian based on corpora automatically annotated up to the level of dependency parsing. The comparative analysis made it possible to assess the influence of both textual genre and linguistic complexity on the distribution of syntactic markedness phenomena.
Generalization in Native Language Identification: Learners versus Scientists
English. Native Language Identification (NLI) is the task of recognizing an author’s native language from text in another language. In this paper, we consider three English learner corpora and one new, presumably more difficult, scientific corpus. We find that the scientific corpus is only about as hard to model as a less-controlled learner corpus, but cannot profit as much from corpus combinat...
Building a Social Media Adapted PoS Tagger Using FlexTag -- A Case Study on Italian Tweets
English. We present a detailed description of our submission to the PoSTWITA shared task on PoS tagging of Italian social media text. We train a model based on FlexTag using only the provided training data and external resources such as word clusters and a PoS dictionary, which are built from publicly available Italian corpora. We find that this minimal adaptation strategy, which already worked we...
Tree Kernels-based Discriminative Reranker for Italian Constituency Parsers
English. This paper aims to close the gap between the accuracy of Italian and English constituency parsing. First, we adapt the Bllip parser (also known as the Charniak parser), i.e., the most accurate constituency parser for English, to Italian and train it on the Turin University Treebank (TUT). Second, we design a parse reranker based on Support Vector Machines using tree kernels, where ...
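The abstract above does not say which tree kernel the reranker uses. As an illustration only, the sketch below implements the classic subset-tree kernel of Collins and Duffy, which scores two parse trees by counting the tree fragments they share, with a decay factor `lam` that down-weights larger fragments. The `Tree` class, the function names and the toy Italian trees are hypothetical and are not taken from the paper or from any parser's API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Tree:
    """Hypothetical minimal constituency tree: a label plus ordered children."""
    label: str
    children: tuple[Tree, ...] = ()

    def is_leaf(self) -> bool:
        return not self.children

    def is_preterminal(self) -> bool:
        # A pre-terminal node (e.g. a PoS tag) has only leaf (word) children.
        return bool(self.children) and all(c.is_leaf() for c in self.children)

    def production(self) -> tuple[str, tuple[str, ...]]:
        # Grammar rule at this node, e.g. ("S", ("NP", "VP")).
        return (self.label, tuple(c.label for c in self.children))


def internal_nodes(t: Tree) -> list[Tree]:
    """All non-leaf nodes of t, in pre-order."""
    if t.is_leaf():
        return []
    nodes = [t]
    for child in t.children:
        nodes.extend(internal_nodes(child))
    return nodes


def subset_tree_kernel(t1: Tree, t2: Tree, lam: float = 1.0) -> float:
    """Sum of Delta(n1, n2) over all pairs of internal nodes (Collins-Duffy)."""
    memo: dict[tuple[int, int], float] = {}

    def delta(n1: Tree, n2: Tree) -> float:
        key = (id(n1), id(n2))
        if key in memo:
            return memo[key]
        if n1.is_leaf() or n2.is_leaf() or n1.production() != n2.production():
            value = 0.0  # different rules: no shared fragment rooted here
        elif n1.is_preterminal():
            value = lam  # same pre-terminal rule: exactly one shared fragment
        else:
            # Each child either stops the fragment (1) or extends it (delta).
            value = lam
            for c1, c2 in zip(n1.children, n2.children):
                value *= 1.0 + delta(c1, c2)
        memo[key] = value
        return value

    return sum(delta(a, b) for a in internal_nodes(t1) for b in internal_nodes(t2))


if __name__ == "__main__":
    def simple(subj: str, verb: str) -> Tree:
        return Tree("S", (
            Tree("NP", (Tree("N", (Tree(subj),)),)),
            Tree("VP", (Tree("V", (Tree(verb),)),)),
        ))

    a, b = simple("Mario", "dorme"), simple("Luigi", "dorme")
    print(subset_tree_kernel(a, a), subset_tree_kernel(a, b))  # 15.0 10.0
```

In an SVM reranker of this kind, such a kernel would compare candidate parses without explicitly enumerating their fragments; the recursion over matching productions is what keeps the count tractable.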
Dealing with Italian Adjectives in Noun Phrase: a Study Oriented to Natural Language Generation
English. This paper describes a theoretical and empirical investigation of the position of adjectives in Italian. The long-term goal that guided the study is the formalization of this information in a natural language generation system. Given that adjectives mainly occur within noun phrases, we focused on them and collected data from corpora representing very differe...